Exact correspondence between walk in nucleotide and protein sequence spaces
Author
Abstract
In the course of evolution, genes traverse nucleotide sequence space, and each such walk translates into a trajectory of changes in protein sequence space. The correspondence between regions of the nucleotide and protein sequence spaces is understood in general but not in detail. One unexplored question is how many protein sequences can be reached from a given protein with a certain number of nucleotide substitutions in its gene sequence. Here I propose an algorithm that calculates the volume of protein sequence space accessible to a given protein sequence as a function of the number of nucleotide substitutions made in the protein-coding sequence. The algorithm uses dynamic programming and completes all calculations within a couple of seconds on a desktop computer. Applying the algorithm to green fluorescent protein, I obtain a number of sequences four times higher than previously estimated. However, given the astronomically large size of protein sequence space, the previous estimate can be considered acceptable as an order-of-magnitude estimate. The proposed algorithm has practical applications in the study of evolutionary trajectories in sequence space.
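The counting idea in the abstract can be illustrated with a minimal Python sketch. This is not the author's published implementation. It assumes the standard genetic code, excludes stop codons, treats codons independently, and assigns each protein sequence to the smallest number of nucleotide substitutions that can produce it. The names `min_distance_counts` and `accessible_counts` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def min_distance_counts(codon):
    """Return [m0, m1, m2, m3], where m_d is the number of amino acids whose
    nearest codon lies exactly d substitutions away from `codon`."""
    best = {}
    for other, aa in CODON_TABLE.items():
        if aa == "*":  # assumption: stop codons are not valid protein states
            continue
        d = sum(x != y for x, y in zip(codon, other))
        best[aa] = min(d, best.get(aa, 3))
    counts = [0, 0, 0, 0]
    for d in best.values():
        counts[d] += 1
    return counts


def accessible_counts(coding_seq, max_subs):
    """Return a list `exact` where exact[k] is the number of distinct protein
    sequences whose minimal nucleotide distance from `coding_seq` is exactly k."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq), 3)]
    exact = [1] + [0] * max_subs  # zero codons processed: one (empty) protein
    for codon in codons:
        m = min_distance_counts(codon)
        nxt = [0] * (max_subs + 1)
        # Dynamic programming step: extend every partial count by one codon.
        for used, ways in enumerate(exact):
            if ways == 0:
                continue
            for d, n in enumerate(m):
                if used + d <= max_subs:
                    nxt[used + d] += ways * n
        exact = nxt
    return exact
```

For the toy input `ATGATG`, `accessible_counts("ATGATG", 2)` gives `[1, 12, 54]`: one sequence needs no substitutions, 12 need exactly one, and 54 need exactly two. A running sum of the list gives the number of protein sequences reachable with at most k substitutions. The running time grows with the number of codons times the substitution budget, which is what makes a dynamic-programming formulation practical for a full-length gene.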
Similar resources
Study on Genetic Diversity of Terminal Fragment Sequence of Isolated Persian Tobacco Mosaic Virus
Tobacco mosaic virus (TMV) is one of the devastating plant viruses in the world that infects more than 200 plant species. Movement protein plays a supportive role in the movement of other plant viruses, and viral coat protein is highly expressed in infected plants and affects replication and movements of TMV. In order to investigate genetic variation in the terminal fragment sequence in Iranian...
Nucleotide Sequence of Gene Encoding Capsid Protein VPI of Foot-and-Mouth Disease Virus / Type O1 Iran
Genetic Analysis of D-Loop Region of Mitochondrial DNA Sequence in Iranian Patients with Familial Adenomatous Polyposis (FAP): A Case-Control Study
Background and Objectives: Familial adenomatous polyposis (FAP) is an inherited disorder and a rare form of colorectal cancer. The disease affects both sexes equally and is more common in the second or third decade of life. Mutations and alterations of the mitochondrial genome, especially the D-loop region, have been reported in various human tumors. But the exact role of these muta...
New characterizations of fusion bases and Riesz fusion bases in Hilbert spaces
In this paper we investigate a new notion of bases in Hilbert spaces and, similar to fusion frame theory, we introduce fusion bases theory in Hilbert spaces. We also introduce a new definition of fusion dual sequence associated with a fusion basis and show that the operators of a fusion dual sequence are continuous projections. Next we define the fusion biorthogonal sequence, Bessel fusion basis, Hil...
Intraspecies Gene Variation within Putative Epitopes of Immunodominant Protein P48 of Mycoplasma agalactiae
The P48 protein of Mycoplasma agalactiae is used to diagnose infection and has been identified as a potential vaccine candidate. Given the genetic nature of mycoplasma and the variable sensitivity of P48-based serological diagnostic tests, intraspecies variation of the P48 nucleotide sequence was investigated in 13 field isolates from different provinces of Iran, along with three vaccine strains. Samples were col...
Journal title:
Volume 12, Issue
Pages -
Publication date: 2017